AITopics | separation performance

Unleashing the Power of Natural Audio Featuring Multiple Sound Sources

Cheng, Xize, Wang, Slytherin, Wang, Zehan, Huang, Rongjie, Jin, Tao, Zhao, Zhou

arXiv.org Artificial Intelligence, Apr-25-2025

Universal sound separation aims to extract clean audio tracks corresponding to distinct events from mixed audio, which is critical for artificial auditory perception. However, current methods heavily rely on artificially mixed audio for training, which limits their ability to generalize to naturally mixed audio collected in real-world environments. To overcome this limitation, we propose ClearSep, an innovative framework that employs a data engine to decompose complex naturally mixed audio into multiple independent tracks, thereby allowing effective sound separation in real-world scenarios. We introduce two remix-based evaluation metrics to quantitatively assess separation quality and use these metrics as thresholds to iteratively apply the data engine alongside model training, progressively optimizing separation performance. In addition, we propose a series of training strategies tailored to these separated independent tracks to make the best use of them. Extensive experiments demonstrate that ClearSep achieves state-of-the-art performance across multiple sound separation tasks, highlighting its potential for advancing sound separation in natural audio scenarios. For more examples and detailed results, please visit our demo page at https://clearsep.github.io.

artificial intelligence, machine learning, natural language, (18 more...)

2504.17782

Country: Asia (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes

Olalere, Feyisayo, van der Heijden, Kiki, Stronks, Christiaan H., Briaire, Jeroen, Frijns, Johan HM, van Gerven, Marcel

arXiv.org Artificial Intelligence, Jan-24-2025

Speech separation approaches for single-channel, dry speech mixtures have significantly improved. However, real-world spatial and reverberant acoustic environments remain challenging, limiting the effectiveness of these approaches for assistive hearing devices like cochlear implants (CIs). To address this, we quantify the impact of real-world acoustic scenes on speech separation and explore how spatial cues can enhance separation quality efficiently. We analyze performance based on implicit spatial cues (inherent in the acoustic input and learned by the model) and explicit spatial cues (manually calculated spatial features added as auxiliary inputs). Our findings show that spatial cues (both implicit and explicit) improve separation for mixtures with spatially separated and nearby talkers. Furthermore, spatial cues enhance separation when spectral cues are ambiguous, such as when voices are similar. Explicit spatial cues are particularly beneficial when implicit spatial cues are weak. For instance, single CI microphone recordings provide weaker implicit spatial cues than bilateral CIs, but even single CIs benefit from explicit cues. These results emphasize the importance of training models on real-world data to improve generalizability in everyday listening scenarios. Additionally, our statistical analyses offer insights into how data properties influence model performance, supporting the development of efficient speech separation approaches for CIs and other assistive devices in real-world settings.
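
To make the idea of explicit spatial cues concrete, here is a minimal sketch that computes two common hand-crafted features from a pair of microphone spectra: the interaural level difference (ILD) and the interaural phase difference (IPD). The abstract does not say which features the authors compute, so the choice of ILD and IPD, the function names, and the toy input values are all assumptions made for illustration.

```python
import cmath
import math

EPS = 1e-8  # assumption: small constant to avoid division by zero and log(0)


def ild_db(left_bin: complex, right_bin: complex) -> float:
    """Level difference between two channels for one frequency bin, in dB."""
    return 20.0 * math.log10((abs(left_bin) + EPS) / (abs(right_bin) + EPS))


def ipd_rad(left_bin: complex, right_bin: complex) -> float:
    """Phase difference between two channels for one frequency bin, in radians."""
    return cmath.phase(left_bin * right_bin.conjugate())


def spatial_features(left_frame, right_frame):
    """Concatenate ILD and IPD over all bins of one STFT frame."""
    ild = [ild_db(a, b) for a, b in zip(left_frame, right_frame)]
    ipd = [ipd_rad(a, b) for a, b in zip(left_frame, right_frame)]
    return ild + ipd


# Toy two-bin frame (hypothetical values, not real recordings).
left = [2 + 0j, 1j]
right = [1 + 0j, 1 + 0j]
features = spatial_features(left, right)
print(features)
```

In a setup like the one the abstract describes, a vector of this kind would be supplied to the separation model as an auxiliary input alongside the audio itself.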

artificial intelligence, machine learning, spatial cue, (17 more...)

2501.14610

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
North America > United States (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Consumer Health (0.72)
Health & Medicine > Therapeutic Area > Otolaryngology (0.35)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

An Ensemble Approach to Music Source Separation: A Comparative Analysis of Conventional and Hierarchical Stem Separation

Vardhan, Saarth, Acharya, Pavani R, Rao, Samarth S, Jasthi, Oorjitha Ratna, Natarajan, S

arXiv.org Artificial Intelligence, Oct-28-2024

Music source separation (MSS) is a task that involves isolating individual sound sources, or stems, from mixed audio signals. This paper presents an ensemble approach to MSS, combining several state-of-the-art architectures to achieve superior separation performance across traditional Vocal, Drum, and Bass (VDB) stems, as well as expanding into second-level hierarchical separation for sub-stems like kick, snare, lead vocals, and background vocals. Our method addresses the limitations of relying on a single model by utilising the complementary strengths of various models, leading to more balanced results across stems. For stem selection, we used the harmonic mean of Signal-to-Noise Ratio (SNR) and Signal-to-Distortion Ratio (SDR), ensuring that extreme values do not skew the results and that both metrics are weighted effectively. In addition to consistently high performance across the VDB stems, we also explored second-level hierarchical separation, revealing important insights into the complexities of MSS and how factors like genre and instrumentation can influence model performance. While the second-level separation results show room for improvement, the ability to isolate sub-stems marks a significant advancement. Our findings pave the way for further research in MSS, particularly in expanding model capabilities beyond VDB and improving niche stem separations such as guitar and piano.
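
As a rough illustration of the stem selection rule mentioned in the abstract, the sketch below picks the candidate model with the highest harmonic mean of SNR and SDR. The model names and scores are made up, the function names `harmonic_mean` and `select_model` are hypothetical, and the handling of non-positive dB values is an assumption; the paper's actual procedure may differ.

```python
def harmonic_mean(snr_db: float, sdr_db: float) -> float:
    """Harmonic mean of two positive scores; 0.0 if either is non-positive."""
    if snr_db <= 0 or sdr_db <= 0:
        return 0.0  # assumption: treat non-positive scores as unusable
    return 2.0 * snr_db * sdr_db / (snr_db + sdr_db)


def select_model(scores: dict) -> str:
    """Return the model whose (SNR, SDR) pair has the largest harmonic mean."""
    return max(scores, key=lambda name: harmonic_mean(*scores[name]))


# Hypothetical (SNR, SDR) scores in dB for one stem.
vocals_scores = {
    "model_a": (9.0, 3.0),  # higher arithmetic mean, but unbalanced
    "model_b": (6.0, 5.0),  # lower arithmetic mean, but balanced
}
best = select_model(vocals_scores)
print(best)
```

The example shows why the harmonic mean is a sensible choice here: `model_a` wins on the arithmetic mean (6.0 versus 5.5), but its weak SDR pulls its harmonic mean below that of the more balanced `model_b`.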

artificial intelligence, deep learning, machine learning, (16 more...)

2410.20773

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Italy > Lombardy > Milan (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (0.94)
Leisure & Entertainment (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Zhao, Junqi, Liu, Xubo, Zhao, Jinzheng, Yuan, Yi, Kong, Qiuqiang, Plumbley, Mark D., Wang, Wenwu

arXiv.org Artificial Intelligence, Jul-16-2024

Universal sound separation (USS) is the task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system to enhance its separation performance. We employ two strategies to utilize SSL embeddings: freezing or updating the parameters of A-MAE during fine-tuning. The SSL embeddings are concatenated with the short-time Fourier transform (STFT) features to serve as input to the separation model. We evaluate our methods on the AudioSet dataset, and the experimental results indicate that the proposed methods successfully enhance the separation performance of a state-of-the-art ResUNet-based USS model.
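
The feature concatenation step can be sketched as follows. This is a toy version, assuming the SSL embeddings arrive at a lower frame rate than the STFT and are aligned by nearest-neighbour repetition; the abstract does not describe the alignment, so that step, the function names, and the tiny shapes are all assumptions.

```python
def upsample_nearest(frames, target_len):
    """Repeat embedding frames so their count matches the STFT frame count."""
    n = len(frames)
    return [frames[min(i * n // target_len, n - 1)] for i in range(target_len)]


def concat_features(stft_mag, ssl_emb):
    """Concatenate each STFT frame with its time-aligned SSL embedding frame."""
    aligned = upsample_nearest(ssl_emb, len(stft_mag))
    return [list(s) + list(e) for s, e in zip(stft_mag, aligned)]


# Hypothetical shapes: 4 STFT frames x 3 bins, 2 embedding frames x 2 dims.
stft_mag = [[0.1, 0.2, 0.3] for _ in range(4)]
ssl_emb = [[1.0, 2.0], [3.0, 4.0]]
features = concat_features(stft_mag, ssl_emb)
print(len(features), len(features[0]))
```

Each output frame has the STFT bins followed by the embedding dimensions, which is the combined input the separation model would consume.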

separation, signal processing, source separation, (15 more...)

2407.11745

Country:

Asia > Middle East > Republic of Türkiye (0.14)
Europe > United Kingdom > England > Surrey (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (0.93)
Media > Music (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Signal-noise separation using unsupervised reservoir computing

Choi, Jaesung, Kim, Pilwon

arXiv.org Artificial Intelligence, May-30-2024

Removing noise from a signal without knowing the characteristics of the noise is a challenging task. This paper introduces a signal-noise separation method based on time series prediction. We use Reservoir Computing (RC) to extract the maximum portion of "predictable information" from a given signal. Reproducing the deterministic component of the signal using RC, we estimate the noise distribution from the difference between the original signal and the reconstructed one. The method is based on a machine learning approach and requires no prior knowledge of either the deterministic signal or the noise distribution. It provides a way to identify additivity/multiplicativity of noise and to estimate the signal-to-noise ratio (SNR) indirectly. The method works successfully for various combinations of signal and noise, including chaotic signals and highly oscillating sinusoidal signals corrupted by non-Gaussian additive/multiplicative noise. The separation performance is robust and notably strong for signals with strong noise, even those with negative SNR.
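
The prediction-then-subtract idea can be sketched with a small echo state network, one common form of reservoir computing. This is not the authors' implementation: the reservoir size, spectral radius, ridge penalty, and the sine-plus-Gaussian test signal are all assumptions chosen to keep the example short, and it relies on NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): a sine wave corrupted by additive Gaussian noise.
t = np.arange(2000) * 0.05
noise_std = 0.3
noisy = np.sin(t) + rng.normal(0.0, noise_std, size=t.shape)

# Fixed random reservoir, rescaled to spectral radius 0.9.
n_res = 100
w_in = rng.uniform(-0.5, 0.5, size=n_res)
w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
w *= 0.9 / np.max(np.abs(np.linalg.eigvals(w)))

# Drive the reservoir; states[k + 1] has seen the signal up to sample k only.
states = np.zeros((len(noisy), n_res))
x = np.zeros(n_res)
for k in range(len(noisy) - 1):
    x = np.tanh(w @ x + w_in * noisy[k])
    states[k + 1] = x

# Ridge-regression readout trained to predict the next noisy sample.
washout = 100
feats, target = states[washout:], noisy[washout:]
ridge = 1e-2
w_out = np.linalg.solve(feats.T @ feats + ridge * np.eye(n_res), feats.T @ target)

predicted = feats @ w_out        # estimate of the predictable component
residual = target - predicted    # what could not be predicted: noise estimate

est_noise_std = residual.std()
est_snr_db = 10.0 * np.log10(predicted.var() / residual.var())
print(est_noise_std, est_snr_db)
```

Because the readout only sees past samples, it can reproduce the deterministic part of the signal but not the independent noise, so the residual serves as an estimate of the noise and its variance gives an indirect SNR estimate.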

noise, noise distribution, separation, (16 more...)

2404.04870

Country:

Asia > South Korea > Ulsan > Ulsan (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Guo, Aoqi, Wu, Junnan, Gao, Peng, Zhu, Wenbo, Guo, Qinwen, Gao, Dazhi, Wang, Yujun

arXiv.org Artificial Intelligence, Jun-28-2023

Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of the neural beamformer. To achieve this, we first use the UNet-TCN structure to model input features and improve the estimation accuracy of the speech pre-separation module by avoiding the information loss caused by direct dimensionality reduction in other models. Furthermore, we introduce a multi-head cross-attention mechanism that enhances the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance.

artificial intelligence, information, machine learning, (18 more...)

2306.15942

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

An efficient encoder-decoder architecture with top-down attention for speech separation

Li, Kai, Yang, Runxuan, Hu, Xiaolin

arXiv.org Artificial Intelligence, Mar-30-2023

Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping model complexity low remains challenging in real-world applications. In this paper, we provide a bio-inspired efficient encoder-decoder architecture that mimics the brain's top-down attention, called TDANet, with decreased model complexity without sacrificing performance. The top-down attention in TDANet is extracted by the global attention (GA) module and the cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract a global attention signal, which then modulates features of different scales through direct top-down connections. The LA layers use features of adjacent layers as input to extract the local attention signal, which is used to modulate the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved separation performance competitive with previous state-of-the-art (SOTA) methods, with higher efficiency. Specifically, TDANet's multiply-accumulate operations (MACs) are only 5% of those of Sepformer, one of the previous SOTA models, and its CPU inference time is only 10% of Sepformer's. In addition, a large-size version of TDANet obtained SOTA results on three datasets, with MACs still only 10% of Sepformer's and CPU inference time only 24% of Sepformer's. Our study suggests that top-down attention can be a more efficient strategy for speech separation.

artificial intelligence, machine learning, natural language, (19 more...)

2209.15200

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals

Lee, Gary C. F., Weiss, Amir, Lancho, Alejandro, Polyanskiy, Yury, Wornell, Gregory W.

arXiv.org Artificial Intelligence, Mar-15-2023

We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals, which are ubiquitous in many modern-day digital communication systems. Related efforts have been pursued in monaural source separation, where state-of-the-art neural architectures have been adopted to train an end-to-end separator for audio signals (as 1-dimensional time series). In this work, through a prototype problem based on the OFDM source model, we assess -- and question -- the efficacy of using audio-oriented neural architectures in separating signals based on features pertinent to communication waveforms. Perhaps surprisingly, we demonstrate that in some configurations, where perfect separation is theoretically attainable, these audio-oriented neural architectures perform poorly in separating co-channel OFDM waveforms. Yet, we propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures, that can confer about 30 dB improvement in performance.

artificial intelligence, deep learning, machine learning, (18 more...)

doi: 10.1109/ICASSP49357.2023.10096702

2303.06438

Country: North America > United States > Massachusetts (0.04)

Genre: Research Report (0.64)

Industry: Government (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Scaling strategies for on-device low-complexity source separation with Conv-Tasnet

Ali, Mohamed Nabih, Paissan, Francesco, Falavigna, Daniele, Brutti, Alessio

arXiv.org Artificial Intelligence, Mar-6-2023

Recently, several very effective neural approaches for single-channel speech separation have been presented in the literature. However, due to the size and complexity of these models, their use on low-resource devices, e.g. hearing aids and earphones, is still a challenge, and established solutions are not yet available. Although approaches based on either pruning or compressing neural models have been proposed, the design of a model architecture suitable for a certain application domain often requires heuristic procedures not easily portable to different low-resource platforms. Given the modular nature of the well-known Conv-Tasnet speech separation architecture, in this paper we consider three parameters that directly control the overall size of the model, namely the number of residual blocks, the number of repetitions of the separation blocks, and the number of channels in the depth-wise convolutions, and experimentally evaluate how they affect speech separation performance. In particular, experiments carried out on Libri2Mix show that the number of dilated 1D-Conv blocks is the most critical parameter and that the use of extra dilation in the residual blocks reduces the performance drop.
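
To see why the number of dilated 1D-Conv blocks matters, the sketch below counts the blocks and the temporal receptive field of a Conv-TasNet-style separator as the number of blocks per repeat and the number of repeats change. It assumes the dilation doubles from block to block within each repeat and a kernel size of 3, as in the original Conv-TasNet design; the function name is hypothetical and the paper's "extra-dilation" variant is not modeled.

```python
def tcn_summary(blocks_per_repeat: int, repeats: int, kernel_size: int = 3):
    """Return (total blocks, receptive field in encoder frames)."""
    # Assumption: dilations 1, 2, 4, ... restart at the start of every repeat.
    dilations = [2 ** i for i in range(blocks_per_repeat)]
    total_blocks = blocks_per_repeat * repeats
    # Each dilated convolution widens the receptive field by (kernel - 1) * dilation.
    receptive_field = 1 + repeats * sum((kernel_size - 1) * d for d in dilations)
    return total_blocks, receptive_field


full = tcn_summary(blocks_per_repeat=8, repeats=3)
small = tcn_summary(blocks_per_repeat=4, repeats=2)
print(full, small)
```

Removing blocks shrinks the receptive field roughly exponentially, whereas removing a repeat shrinks it only linearly, which is one plausible reason the block count is the more sensitive parameter.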

artificial intelligence, deep learning, machine learning, (18 more...)

2303.03005

Country:

Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

CasNet: Investigating Channel Robustness for Speech Separation

Wang, Fan-Lin, Cheng, Yao-Fei, Lee, Hung-Shin, Tsao, Yu, Wang, Hsin-Min

arXiv.org Artificial Intelligence, Oct-27-2022

Recording channel mismatch between training and testing conditions has been shown to be a serious problem for speech separation. This mismatch greatly reduces separation performance and fails to meet the requirements of daily use. In this study, building on our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation. CasNet is implemented on top of TasNet. A channel embedding (characterizing channel information in a mixture of multiple utterances) generated by the Channel Encoder is introduced into the separation module by the FiLM technique. Through two training strategies, we explore two roles that the channel embedding may play: 1) a real-life noise disturbance, making the model more robust, or 2) a guide, instructing the separation model to retain the desired channel information. Experimental results on TAT-2mix show that CasNet trained with both training strategies outperforms the TasNet baseline, which does not use channel embeddings.
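
FiLM (feature-wise linear modulation) conditions a network by scaling and shifting each feature channel with values computed from a conditioning vector. The sketch below shows that operation with a channel embedding as the condition. The weights, shapes, and function names are placeholders for illustration only, and nothing here reflects how CasNet actually wires FiLM into its separation module.

```python
def linear(vec, weights, bias):
    """Plain dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def film(features, embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each feature channel using the conditioning embedding."""
    gamma = linear(embedding, w_gamma, b_gamma)  # per-channel scale
    beta = linear(embedding, w_beta, b_beta)     # per-channel shift
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Hypothetical values: 2 feature channels x 2 time steps, 2-dim channel embedding.
features = [[1.0, 2.0], [3.0, 4.0]]
embedding = [1.0, -1.0]
w_gamma, b_gamma = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w_beta, b_beta = [[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0]

modulated = film(features, embedding, w_gamma, b_gamma, w_beta, b_beta)
print(modulated)
```

In a trained model the `gamma` and `beta` projections would be learned, so the same separation layers can behave differently depending on the recording channel described by the embedding.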

artificial intelligence, deep learning, machine learning, (18 more...)

2210.15370

Country: Asia > Taiwan (0.05)

Genre: Research Report > New Finding (0.48)

Industry: Media (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
